Formal means for describing the syntax of code switching are
proposed and illustrated with examples from Puerto Rican Spanish
and English. The role of code switching constraints in determining
the way two monolingual grammars may be combined in generating
discourse containing code switches is analyzed. Intrasentential
code switching is characterized as a development requiring
competence in the two component codes and skill in manipulating
the codes concurrently. Based on code switches in recordings of 20
Puerto Rican bilingual or Spanish-dominant speakers, the
distinction between surface and deep code switches, the free union
of two grammars, a code switching grammar, superscript
conventions, probabilistic grammars, code switching frequencies
and rates, and rule frequencies are discussed. Two linguistic
constraints on code switching are identified: the free morpheme
constraint and the equivalence constraint. The performance data
provided quantitative confirmation of the validity of these
constraints. (RW)
Code-switching in situations of language contact has been studied
largely from the point of view of its social determinants. This paper
will propose formal means for describing the syntax of code-switching
with examples from Puerto Rican Spanish and English.

1. INTRODUCTION

Among the diverse configurations of linguistic performance in
communities where two or more languages are in contact, the
alternating use of different languages within a given situation, or
code-switching, is a well-documented pattern. Much progress has been
made in situating code-switching within a micro-sociological framework
or that of the ethnography of speaking, consistent with the goals of
understanding the interactive purpose, communicative function and
social implications of this behavior (e.g. Gumperz 1964, 1971, 1976;
G. Sankoff 1968, 1972; Denison 1972; Gumperz and Hernandez-Chavez
1970; McClure and Wentz 1975; McClure 1977; di Sciullo et al. 1976;
Valdes-Fallis 1976, 1978). A relatively small number of studies have
focused directly on the grammatical aspects of code-switching (e.g.
Hasselmo 1972, 1979; Gingras 1974; Lance 1975; Timm 1975, 1978;
Pfaff 1975, 1976, 1979; Wentz 1977; Lipski 1978).

Complete understanding of code-switching could only be achieved
through combined ethnographic, attitudinal and grammatical study,

Linguistic Research Inc. 1981   0031-1251/81/01 03-4



4 DAVID SANKOFF & SHANA POPLACK 

i.e. an integrated analysis not only of when people code-switch but
how, where and why. The present paper is but part of such an on-going
investigation; though here we concentrate on the purely syntactic
aspects of code-switching, we in no way minimize the social
determinants and implications of this behavior, which previous reports
have explored in conjunction with linguistic aspects (Poplack 1978,
1979a, 1979b).

We distinguish code-switching from other possible outcomes of
language contact situations, such as interference, pidginization,
borrowing, calquing, language death, relexification, learned use of
foreign words, cross-language punning and other word-play, by at least
two criteria. One is that whereas many of the above involve
deformation or replacement of parts of the grammar or lexicon of the
language(s) involved, code-switching does not. This is one of the
basic postulates of this paper. Second, unlike other of the
above-mentioned phenomena which refer to specialized situations or
language functions, what we understand by 'code-switching' here is a
widely operative norm of communication in certain types of
multilingual communities (see also Sankoff 1972; Pedraza ms.). These
characteristics of code-switching, the structural integrity of the
component languages and its prevalence in a broad range of
communicative situations, have deep implications for grammatical
theory. Insofar as discourse is generally thought of as being
generated through the coherent pragmatic, semantic and syntactic
mechanisms of a language shared by members of a community, how can two
distinct languages reconcile their differences in such a way as to
result in discourse involving language switches not only between
utterances, but also within a single sentence? More specifically, how
can we construct a formal account of the grammatical mechanism which
underlies discourse containing code-switching?

Note that there is no syntactic difficulty involved in alternating
whole sentences, or larger segments, of different languages as in (1);
this practice is common among bilinguals responding to a change in
interlocutor, topic or setting (e.g. Weinreich 1953; Gal 1978).

(1) ¿Tú eres ateo? [You're an atheist?]



A FORMAL GRAMMAR FOR CODE-SWITCHING

No he's not. He believes in something. (C.A./44)

The real problem involves the maintenance of syntactic integrity
of a single sentence containing elements of two or more languages, as
in (2):

(2) So you todavía haven't decided lo que vas a hacer next
week. [So you still haven't decided what you're going to
do next week.] (P.A./135)

A series of empirical studies of verbal interaction in one of the
oldest Puerto Rican communities in the United States (Poplack 1978,
1979a, 1979b) has confirmed that there are only two general linguistic
constraints on where switching may occur:

a) The free morpheme constraint: a switch may not occur
between a bound morpheme and a lexical form unless the latter
has been phonologically integrated into the language of the
bound morpheme.

This excludes switches like (3), in which the phonology of run is
unambiguously English, while that of -eando is unambiguously Spanish
(and which in fact do not occur), but not forms like (4). Indeed, we
consider here phonologically, morphologically and syntactically
integrated items like the latter to be Spanish forms, and not
instances of code-switching.

(3) *run-eando [rʌn-e'ando] 'running'

(4) flipeando [flipe'ando] 'flipping'

b) The equivalence constraint: the order of sentence
constituents immediately adjacent to and on both sides of the
switch point must be grammatical with respect to both
languages involved simultaneously. This requires some
specification: the local co-grammaticality or equivalence of the
two languages in the vicinity of the switch holds as long as the
order of any two sentence elements, one before and one after the
switch point, is not excluded in either language.

The equivalence constraint is illustrated in Figure 1, where the dotted
lines indicate permissible switch points, and the arrows indicate the
surface relationship of the two languages. Switches may occur at, but
not between, the dotted lines.



[Dotted lines and arrows of the original figure are not reproduced.]

A. Eng   I seen everything 'cause I didn't take anything.
B. Sp    Yo vi todo porque yo no cogí nada.
C. CS    I seen everything 'cause no cogí na'. (S.L./1)

D. Eng   He gets [*to him] a stomach ache.
E. Sp    [*él] le [verb illegible] un dolor de barriga. (S.L./4)

Figure 1. Permissible code-switching points. The speaker's
actual performance is represented in (C), containing
one switch, and (E), containing no switch.
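
The free morpheme constraint stated in (a) can be read as a simple
predicate over two adjacent morphemes. The following is a minimal
sketch, not part of the original analysis. The Morpheme class and the
function name are hypothetical, and the sketch assumes that the "lang"
field records the language whose phonology the form actually shows, so
that an integrated stem like that of (4) counts as Spanish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    form: str
    lang: str            # language of the form's phonology, e.g. "eng" or "sp"
    bound: bool = False  # True for affixes such as -eando


def violates_free_morpheme(left: Morpheme, right: Morpheme) -> bool:
    """True if a switch falls between a bound morpheme and a lexical form
    that is not phonologically integrated into the bound morpheme's language."""
    if left.bound == right.bound:
        # The constraint only concerns a bound morpheme next to a lexical form.
        return False
    return left.lang != right.lang


run = Morpheme("run", "eng")                  # English phonology, as in (3)
flip = Morpheme("flip", "sp")                 # Spanish phonology, as in (4)
eando = Morpheme("-eando", "sp", bound=True)

print(violates_free_morpheme(run, eando))   # True: excluded, like (3)
print(violates_free_morpheme(flip, eando))  # False: permitted, like (4)
```

Treating phonological integration as a relabelling of the stem's
language is one possible design choice; it mirrors the statement above
that integrated items are considered Spanish forms rather than
switches.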

Linguistic performance constrained in this way must be
based on simultaneous access to the grammatical rules of both
languages. This raises the question of the existence and nature
of a code-switching grammar. In this paper we describe in formal
terms how the code-switching constraints determine the way the
two monolingual grammars may be combined in generating
discourse containing code-switches.

Aside from its purely formal interest, this analysis will
illustrate how code-switching, especially intra-sentential
code-switching, rather than representing a debasement of linguistic
skills, as certain prescriptivists claim (e.g. de Granda 1968;
Varo 1971; LaFontaine 1975), is a development requiring
competence in the two component codes, as well as the
additional skill to manipulate them concurrently.

2. SOCIOLINGUISTIC CONTEXT AND THE UNIVERSALITY OF
CONSTRAINTS

Concurrent with the enunciation of the two general
code-switching constraints, free morpheme and equivalence, it was
shown that the more particularistic constraints posited previously,
for example that single determiners or subject pronouns cannot be
switched (Timm 1975; Gumperz 1976; Wentz 1977), or any other such
restrictions, are not borne out empirically, except where they are
consequences of the two general constraints.

However, establishing the status of the free morpheme and
equivalence constraints as universal or near-universal conditions on
switching would require much comparative empirical work. Aside from
the Puerto Rican data, they have been verified for Chicano materials
published by Valdes-Fallis (1976) and Pfaff (1975, 1976),
Swedish-English (di Sciullo et al. 1976) code-switching, and in a
preliminary though quantitative way on Greek-English, French-English,
Italian-English and Yiddish-Spanish-Hebrew data (D. Tohg and S.
Papadopoulos, D. Sheeh, F. Marchese and D. Litvak, New York
University class papers).

However, it is not clear how the free morpheme constraint
might operate in a situation involving English and some highly
inflected or agglutinative language, nor what might be the scope of
the equivalence constraint for languages with highly different word
orders. To be pertinent, evidence in such cases would depend on
establishment of rigorous criteria for (a) distinguishing switches
from borrowing, calquing or relexification patterns which may have
become part of the monolingual norm, (b) identifying possible
equivalence constraint violations against a background of information
on monolingual word order constraints, not based on assumptions about
standard languages, but on empirical documentation of dialectal or
community usage, (c) determining whether code-switching as such is a
functional mode of communication within the community, or simply an
occasional artifact of interference or other language contact
processes, and (d) assessing individual performance in terms of degree
of community membership, degree of L2 acquisition, and control of
code-switching mode.

Aside from the question of the validity of the two constraints
across different multilingual communities, there is also the problem
of additional constraints which might hold in specific social
contexts. For example, in some situations involving clearly socially
dominant/subordinate pairs of languages, switches may occur only by
the insertion of occasional lexical items from the dominant language
into the discourse of the other, but not the reverse (e.g. Denison
1972, G. Sankoff 1972). In the Puerto Rican situation the free
morpheme constraint is partially superseded by a stronger constraint
completely excluding English inflections on lexical items of Spanish
origin, since such items rarely seem to be phonologically or
semantically integrated into the English grammatical system (Pedraza
ms.).

Another example from the Puerto Rican study involves
code-switching among certain speakers whose migrational and
educational history has resulted in their being less fluent in English
than in Spanish. The equivalence constraint plays little role in this
situation; because of their limited competence in English syntactic
patterns, these speakers produce virtually no intra-sentential
code-switches. Instead, they largely confine themselves to switching
to English for sentence tags, interjections, and the occasional single
noun in an otherwise entirely Spanish sentence (Poplack 1979a).

Indeed, the validity of any code-switching constraint, including
the free morpheme and equivalence constraints, depends strongly on
the particular configuration of social factors obtaining in a given
community. A typology of the different patterns of code-switching
would have to take account of such factors.

3. SCOPE OF THIS STUDY

Compared to the extensive literature on the interactional and
pragmatic aspects of code-switching, the syntactic aspects have only
begun to be clarified. One of the problems has been that the syntax
involved is not easily or convincingly accessible to intuition;
switches are not readily elicited, and acceptability judgements may be
unreliable and normatively biased. On the other hand, observation is
exceedingly difficult, given the precarious balance of situational
factors which must be sustained in order to assure the considerable
volume of speech in the code-switching mode necessary for any
statistically valid analysis of syntactic patterns.

One of the situational factors which may play a crucial role is
the ethnic identity of the interviewer. As part of a long-term
participant observation study in East Harlem, Pedro Pedraza collected
recordings of Puerto Rican speech behavior in a variety of settings
(Pedraza ms.). It has been demonstrated (Poplack 1978) that the
in-group status of the interviewer coupled with relatively unobtrusive
data gathering techniques yielded a body of code-switching data
qualitatively more diverse and quantitatively more numerous than that
which could have been elicited by an outsider to the community.

A selection was made of recordings of 20 individuals including
both balanced bilinguals and speakers who are fluent in Spanish but not
in English. The code-switches were extracted from these recordings
with the help of Alicia Pousada, and were analyzed in a previous study
(Poplack 1979a). An aim of this paper is to reanalyze these data within
a formal grammatical framework. Because of the surface nature of the
code-switching constraints described in section 1, the formalism we
adopt is one based on the direct generation of surface phrase structures
by a context-free grammar. In section 5 we justify our choice of this
approach rather than an attempt to generate switches in deep structure.
In order that this analysis be as relevant as possible to the statistical
generalizations drawn from speech performance data, we discuss how to
probabilize the monolingual Spanish and English grammars, and the
code-switching grammar which results from their combination. In a
preliminary exercise based on speech samples of a Puerto Rican
bilingual speaker, we then calculate the frequencies of the different
rules in the grammars as well as the relative frequency of the various
syntactic boundaries eligible to be the site of a code-switch. These
frequencies of potential switch sites are then compared with actual
switch frequencies at these sites compiled in the previous study for
the sample of 20 Puerto Rican speakers, to give the relative
susceptibility to code-switching of each kind of syntactic boundary.
The theoretical discussions serve as a framework and justification for
this analysis of syntactic boundaries and their switch propensities,
which is the main innovation of this paper. For the first time, we
present actual code-switching rates, and these show that the
equivalence and free morpheme constraints have implications which go
beyond their qualitative formulations.
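
The comparison of potential switch sites with actual switches can be
illustrated by a small calculation. This is only a sketch under
stated assumptions: the function name, the boundary labels and all
counts are hypothetical, and the ratio of shares used here is one
plausible reading of "relative susceptibility", not necessarily the
exact measure computed in the study.

```python
def switch_propensity(actual_switches, potential_sites):
    """Share of observed switches at each boundary type divided by that
    boundary type's share of potential switch sites (assumed measure)."""
    total_actual = sum(actual_switches.values())
    total_potential = sum(potential_sites.values())
    return {
        boundary: (actual_switches.get(boundary, 0) / total_actual)
        / (potential_sites[boundary] / total_potential)
        for boundary in potential_sites
    }


# Hypothetical counts, for illustration only (not data from the corpus).
actual = {"DET|N": 30, "V|NP": 50, "SUBJ|VP": 20}
potential = {"DET|N": 200, "V|NP": 100, "SUBJ|VP": 100}

for boundary, value in switch_propensity(actual, potential).items():
    print(boundary, round(value, 2))
```

A value above 1 would indicate a boundary type that is switched more
often than its frequency as a potential site would suggest, and a
value below 1 the opposite.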

4. HOW MANY GRAMMARS? 

There has been some debate over whether discourse containing
code-switches is generated by the alternate use of the two monolingual
grammars or whether a single code-switching grammar exists, combining
elements of the monolingual grammars.2 There are really two questions
involved, one notational or definitional, and one substantive. Any
finite set of rules and procedures for generating an infinite set is a
grammar, formally speaking, so that any set of rules for constructing
the set of sentences containing code-switches is a grammar.

Apart from definitions of a grammar, there remains the more
important question of whether code-switching involves the alternation
from one distinct linguistic system to another, or whether speakers are
exemplifying some integrated competence in the two languages. The
evidence which seems most pertinent to this issue is the finding that
code-switching generally does not entail pauses, hesitations,
repetitions, corrections or any other interruption or disruption in
the rhythm of speech (Poplack 1979a). This is distinct from many
bilingual situations marked by language interference, for example, and
provides some justification for treating code-switched discourse, at
least in parts, as being generated by a single grammar based on the two
monolingual ones. It will be clear, moreover, from the way that this
grammar must be constructed, that code-switching is not a result of
imperfect competence in either of the two monolingual modes of
communication but rather results from knowledge of the rules of both,
their similarities and differences; nor do code-switchers suffer loss
of competence by virtue of their skill at the code-switching mode.

One way of avoiding the conceptual problems involved in the
notion of a code-switching grammar has been to postulate that one or
the other of the monolingual grammars is basic to any particular
sentence. But this attitude, embodied in the hypothesis advanced by
Wentz (1977) and others that every sentence has only one "base"
language, which can be ascertained by the languages of the determiner
and/or the verb, does not seem pertinent to the East Harlem situation
(nor, for that matter, to other published Chicano data). The viewpoint
that there is an easily identifiable base language is associated with
the notion that code-switches involve the insertion of isolated L1
elements or constituents in otherwise L2 discourse, or vice-versa.
This may very well be the case in certain contexts, such as those
described in some of the studies cited in section 2. Indeed, in the
previous analysis of the Puerto Rican data, a method was
operationalized to identify "base language" and "language of the
switch". It became clear, however, that in many cases this procedure
was arbitrary.

A sketch of the different types of distribution of the two
languages in code-switching discourse will help explain why. Such
discourse may contain a stretch of several sentences clearly
identifiable as belonging to one language (except for occasional words
or constituents), as in (5).

(5) 'Cause I believe they're poor, they gotta know how to 
eat everything; not just little desserts and esos potes [those 
jars] which I don't like them. (S.L./9) 

But in other stretches, constituents may oscillate several times from
one language to the other, even within the confines of a single
sentence, as in (6).

(6) There was a guy, you know, que [that] he se montó [got
up]. He started playing with congas, you know, and se
montó y empezó a brincar [got up and started to jump]
and all this shit. (P.R./25)

There is no empirical justification for insisting that stretches
like (6) or (17) have one underlying language with insertions from the
other language. Indeed, no algorithm to determine "base language" so
far proposed applies consistently and convincingly to performance data
containing multiply switched sentences. What is more consistent with
the data is simply to allow the possibility that in the uttering of a
sentence, the rules used to construct its constituents may be drawn at
times from one monolingual grammar and at times from another. Thus in
what follows, neither the root S node of a phrase structure tree nor
the NP, VP, etc. nodes, must be identified as to language, though some
of them necessarily will be.



Summarizing these considerations, long monolingual stretches
of discourse may be thought of as being generated by a monolingual
grammar, but the notion of a code-switching grammar seems to be
called for where switches occur with high density. It will be seen that
such a grammar may be formalized so as to subsume the two
monolingual grammars, allowing the entire discourse to be analyzed
within a uniform framework.

5. SWITCHES - SURFACE OR DEEP?

The code-switching constraints are constraints on the surface
syntax of a sentence. There is no empirical evidence that code-switched
sentences are generated as such in a base component and preserved as
such through a series of transformations, as suggested, for example, by
Barkin and Rivas (1979). Indeed, the evidence is against this. Parts of
sentences which may be analyzed as having been displaced by movement
transformations are in no way constrained, in real data, to be of
the same language as the elements which may have been adjacent to
them in deep structure, but are rather constrained, if at all, by their
surface neighbors.

The following example is somewhat of a straw man, since both
of its postulates are easily demolished. However, it clearly illustrates
how a movement transformation operating on a constituent which is
constrained against being switched in deep structure, implies a clearly
invalid surface constraint.

Timm (1975, 1978), Gumperz (1976) and Barkin and Rivas
(1979) have all suggested that underlying subject pronouns must be in
the same language as the verb of a sentence. Thus the code-switch in
sentence (7) below (as well as one in (6)) should be excluded. Were
passives generated transformationally, sentence (8) would also be
excluded since its underlying form is of the same type as (7). In fact,
(8) is not excluded, being typical of attested code-switches involving
prepositional phrases.

(7) You estás diciéndole la pregunta to the wrong person.
[You're asking the question to the wrong person.]
(P.A./43)

(8) La pregunta fue dicha [the question was asked] by you.

The facts that sentences like (7) are also attested in these data,
and that passives are not transformationally generated in many current
analyses, do not alter our contention that a transformational analysis
of code-switching will necessarily exclude many well-attested
constructions.4 Conversely, such an analysis might also produce
violations of the code-switching constraints by moving items remote in
deep structure, and hence permitted to be in different languages, to
adjacent positions on the surface, where they would violate the free
morpheme or equivalence constraints:

(9) The car del hombre [of the man],
but
(10) *el hombre's car

6. FREE UNION GRAMMAR




Following the considerations of the preceding sections,
we will seek in the data analysis to sketch surface grammars for the
Spanish and English spoken in our corpus, as well as for the
code-switching mode. Our goal is obviously not to solve all the
classical problems involved in constructing a complete generative
description of any of the languages involved, but to illustrate how two
formal monolingual grammars can be combined to produce a grammar of
the code-switching mode.

Suppose we have two context-free phrase structure grammars
G1 and G2 for languages L1 and L2, such that the non-terminal
grammatical categories of one generally have corresponding categories
in the other. We call this the first translatability condition. In
addition, we assume each rule in G1 can be functionally translated by
at least one rule in G2, e.g. the rule S → VP NP which results in
Spanish post-posed subjects can always be translated by the English
S → NP VP. This is the second translatability condition. These two
translatability conditions will generally hold for any two natural
languages described within a common theoretical framework.

The first condition allows us to define the FREE UNION of the
two grammars consisting of the common set of grammatical categories,
the combined set of rewrite rules from G1 and G2, and the combined
lexicons. The resulting entity is a phrase structure grammar, it is
context-free, it subsumes the two monolingual grammars, generating
all sentences in L1 and L2, and it generates all possible sentences
containing code-switches. Yet this grammar is of little interest. Not
only does it generate equivalence constraint violations like (11), but
it also generates ungrammatical monolingual constituents like (12).

(11) NP → DET N ADJ (from Spanish)
     DET → the
     N → casa
     ADJ → white

     *the casa white

(12) NP → DET N ADJ (from Spanish)
     DET → the
     N → house
     ADJ → white

     *the house white
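
The overgeneration of the free union grammar can be reproduced with a
toy fragment. The following is a minimal sketch: the dictionaries, the
function names and the three-word lexicons are hypothetical
illustrations, not the grammars used in the data analysis.

```python
import itertools

# Toy monolingual grammars: each maps a category to its possible rewrites.
G1_RULES = {"NP": [("DET", "N", "ADJ")]}   # Spanish order
G2_RULES = {"NP": [("DET", "ADJ", "N")]}   # English order
G1_LEX = {"DET": ["la"], "N": ["casa"], "ADJ": ["blanca"]}
G2_LEX = {"DET": ["the"], "N": ["house"], "ADJ": ["white"]}


def free_union(rules_a, rules_b, lex_a, lex_b):
    """Pool the rewrite rules and the lexicons over the shared categories."""
    rules = {c: list(dict.fromkeys(rules_a.get(c, []) + rules_b.get(c, [])))
             for c in set(rules_a) | set(rules_b)}
    lexicon = {c: lex_a.get(c, []) + lex_b.get(c, [])
               for c in set(lex_a) | set(lex_b)}
    return rules, lexicon


def generate(category, rules, lexicon):
    """Yield every word string derivable from `category` (finite toy grammar)."""
    if category not in rules:              # terminal category: lexicalize it
        for word in lexicon[category]:
            yield (word,)
        return
    for rhs in rules[category]:
        parts = [list(generate(sym, rules, lexicon)) for sym in rhs]
        for combo in itertools.product(*parts):
            yield tuple(word for part in combo for word in part)


rules, lexicon = free_union(G1_RULES, G2_RULES, G1_LEX, G2_LEX)
phrases = {" ".join(p) for p in generate("NP", rules, lexicon)}

print("the casa white" in phrases)    # True: the violation in (11)
print("the house white" in phrases)   # True: the violation in (12)
print(len(phrases))                   # 16
```

Of the 16 strings this fragment generates, only some are acceptable,
which is why the free union needs further restriction.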

Thus the free union grammar of G1 and G2 is not a satisfactory
code-switching grammar. Some mechanism must be found for restricting
the output of the grammar so that the monolingual sentences it
generates are grammatical according to G1 or G2, and the bilingual
sentences satisfy the code-switching constraints. One way of doing this
would be simply to have an output filter which rejected all unsuitable
sentences. In general, however, the problem of constructing a finite set
of rules for recognizing ungrammatical sentences, or switches violating
the constraints, is no less difficult than constructing the entire
grammar. This solution, then, would only be feasible for some special
pairs of very similar languages where code-switching violations could
be easily recognizable as belonging to some small predetermined set.
Furthermore, this solution not only trivializes the problem of finding
the structure of the code-switching grammar, but also results in a
grammar which is not context-free. Rather, it has some ill-defined,
complicated structure which is not directly comparable to the
monolingual grammars.

Having thus rejected the free union grammar, with or without
output constraints, we are faced with the key task of this paper: to
incorporate the code-switching constraints into the rules of the
phrase-structure grammar without altering its context-free nature.

The basic problem is that the code-switching constraints are,
generally speaking, conditions on adjacent constituents, but the essence
of context-free generation of sentences is that the internal structure of
one constituent does not condition that of another. To solve this
problem we must ensure that for any two neighboring constituents
whose boundary could potentially involve a code-switch violating one
of the constraints, suitable restrictions must already be coded into the
symbol for the grammatical category heading each constituent. And as
these symbols are rewritten, the restriction information must be passed
on to, or inherited by, lower level constituents, so that when the
terminal grammatical categories are finally lexicalized, the restriction
will be realized by a compatible choice of language for neighboring
lexical terms.

The approach we will take is to introduce superscripts on the
symbols for the various categories of the grammar, and to restrict the
application of certain rules to symbols with appropriate superscripts.
These superscripts will appear only in certain derivations and only at
certain nodes, and they will carry information sufficient to prevent any
violation of the code-switching constraints, and to permit any
code-switches which do not violate them.

7. A CODE-SWITCHING GRAMMAR 

The code-switching grammar will then be constructed as follows.
Its lexicon will be the combined lexicon of the two monolingual
grammars. Its grammatical categories will be the grammatical categories
of G1 and G2 (most of which they have in common). Each category
may occur in a (possibly large, but finite) number of versions,
depending on the presence of superscripts, as will be explained below.
As for the rules of the code-switching grammar, consider first any rule
R in G1. Using the second translatability condition stated above, we
can compare R to all its possible translations by rules of G2. Suppose
for any pair of symbols in the output of R, there exists at least one G2
translation which does not reverse the order of the two symbols. Then
R is included among the rules of the code-switching grammar, again
possibly in a number of different versions. Rules of G2 are similarly
included in the code-switching grammar if they satisfy an analogous
condition. Now, if in the output of the rule R there are two
(obligatory) symbols ordered in a way excluded in all the corresponding
G2 rules, the equivalence constraint means that we must not allow a
switch from L1 to L2 after the constituent headed by the first of these
two symbols. Similarly, if R generates two consecutive symbols, the
first of which represents a morpheme bound to the second, a switch
from L1 to L2 must be precluded between the symbols. Likewise, if the
second symbol is the bound morpheme, no switch from L2 to L1 may
intervene.
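
The test applied to each rule R can be sketched as a search for pairs
of symbols whose order is reversed in every translation. This is an
illustrative sketch only: the function name is hypothetical, each
symbol is assumed to occur at most once per rule, and a pair is
reported only when both symbols appear in every translation, which
stands in for the "obligatory" symbols mentioned above.

```python
from itertools import combinations


def reversed_pairs(rule_rhs, translations):
    """Return pairs (a, b), with a before b in `rule_rhs`, whose order is
    reversed in every one of the translating rules."""
    if not translations:
        # The second translatability condition guarantees at least one.
        raise ValueError("each rule needs at least one translation")
    conflicts = []
    for a, b in combinations(rule_rhs, 2):
        if all(a in t and b in t and t.index(a) > t.index(b)
               for t in translations):
            conflicts.append((a, b))
    return conflicts


# Spanish NP rule against its English translation: N and ADJ conflict.
print(reversed_pairs(("DET", "N", "ADJ"), [("DET", "ADJ", "N")]))
# [('N', 'ADJ')]

# Spanish post-posed subject rule against English S -> NP VP.
print(reversed_pairs(("VP", "NP"), [("NP", "VP")]))
# [('VP', 'NP')]

# English S -> NP VP has a Spanish translation in the same order: no conflict.
print(reversed_pairs(("NP", "VP"), [("NP", "VP"), ("VP", "NP")]))
# []
```

A rule with an empty result would be included in the code-switching
grammar unchanged; a rule with conflicting pairs is the case that
calls for superscripts.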

To ensure that such restrictions are obeyed, the rule R cannot
be incorporated into the code-switching grammar as is. Rather, the two
symbols in question must be modified in the output of R by
superscripts which indicate that the constituents they head are in a
strictly L1 order.

8. SUPERSCRIPT CONVENTIONS

Each superscript will have two components separated by a colon,
the first component indicating a language, the second a terminal
category (e.g. sp:adj or eng:det). This category and only this category
will be the one which must be lexicalized in L1. A simple example
involves the Spanish rule NP → DET N ADJ, whose English translation
is NP → DET ADJ N (13). Here the superscript on the N in the
code-switching grammar is sp:n, and on the ADJ it is sp:adj. In this
case the superscript means only that when the category N is
lexicalized, it will be in Spanish, and similarly for the ADJ. Note
that the DET remains unsuperscripted, so that it may be lexicalized in
Spanish or in English.

(13) Spanish:         NP → DET N ADJ
     English:         NP → DET ADJ N
     Code-switching:  NP → DET N^sp:n ADJ^sp:adj

This suffices to preclude code-switching constraint violations like (11)
and monolingual grammaticality violations like (12).
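
The effect of the superscripts in (13) on lexicalization can be
checked with a small sketch. The lexicon, the rule encoding as
(category, superscript) pairs and the function names are hypothetical;
only the Spanish-order rule is modelled, and the English-order rule is
assumed to be handled analogously.

```python
from itertools import product

# Hypothetical toy lexicon, indexed by language and terminal category.
LEXICON = {
    "sp": {"DET": ["la"], "N": ["casa"], "ADJ": ["blanca"]},
    "eng": {"DET": ["the"], "N": ["house"], "ADJ": ["white"]},
}

# Code-switching rule from (13): DET is unsuperscripted, N and ADJ are not.
CS_NP_RULE = [("DET", None), ("N", "sp:n"), ("ADJ", "sp:adj")]


def allowed_words(category, superscript):
    """Words that may lexicalize `category` given its superscript."""
    if superscript is not None:
        lang, target = superscript.split(":")
        if target == category.lower():
            return LEXICON[lang][category]   # language fixed by the superscript
    # No superscript, or one naming another category: either language.
    return LEXICON["sp"][category] + LEXICON["eng"][category]


def lexicalize(rule):
    """All word strings obtainable from one superscripted rule."""
    choices = [allowed_words(cat, sup) for cat, sup in rule]
    return [" ".join(words) for words in product(*choices)]


print(lexicalize(CS_NP_RULE))
# ['la casa blanca', 'the casa blanca']
```

Neither "the casa white" nor "the house white" can be produced from
this rule, while the switch after the determiner remains available.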

To satisfy the free morpheme constraint, it is necessary that any
rule generating a Spanish bound morpheme incorporate sp superscripts
on this morpheme and on the free morpheme category to which it is
bound.

What of rules rewriting high order categories? For the Spanish
postposed subject rule in (13a) we cannot leave unrestricted further
choice of rewrite and lexicalization rules without risking generating
sentences like (13b). Nor do we want to be restricted to Spanish only
for further rules: this would exclude (13c), which is in no way
unusual.

(13a) S -> VP NP

(13b) *arrived he

(13c) Llegó yesterday la mamá mía. [My mother arrived
      yesterday.]

Thus the grammatical category component of the superscript
must be carefully chosen to ensure that the equivalence constraint is
not violated, but without putting any other restriction on the string
being generated. This is done as in (14).

(14) S -> VP^sp:v NP^sp:1

When the VP is rewritten, its superscript is transmitted to all symbols
in the output of the rewrite rule, as in (15).

(15) VP^sp:v -> V^sp:v ADV^sp:v

When the V^sp:v is lexicalized, it must be in Spanish, but as for the
ADV^sp:v category, since the superscript does not specify sp:adv,
an adverb may be chosen from either the Spanish or English lexicons,
as in (13c). We refer to this as a heritability condition. The
transmission of the sp:v superscript from any symbol which has it to
all the symbols which rewrite it is the most general type of
heritability condition. For each rule which rewrites VP, another
version must occur in the grammar with all symbols superscripted sp:v,
and the same holds for any symbol in THEIR outputs which is
non-terminal (i.e. is to be rewritten), and so on. The only exceptions
are: (a) embedded S nodes do not inherit superscripts, (b) superscripts
originating in equivalence constraints in embedded constituents, or in
free morpheme constraints, supersede those from high order equivalence
constraints and (c) lexicalization of categories not involved in the
superscript is unrestricted as to language.

A second type of heritability condition is exemplified by the
sp:1 superscript on the NP in (14). Any time a symbol superscripted
this way is rewritten, the superscript must be passed on to at least
one symbol in the output of the rule. And any terminal grammatical
category thus superscripted must be lexicalized in Spanish. Again,
embedded S nodes do not inherit this superscript. The sp:1 superscript
serves simply to ensure that the NP is not entirely lexicalized in
English, though there are no empirical grounds for specifying that any
PARTICULAR element of the NP, even the DET, be in Spanish.
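
The first, most general heritability condition can be illustrated with
a short Python sketch. It is only an illustration under stated
assumptions: the names Sym, rewrite and lexicalization_languages are
hypothetical, superscripts are modelled as (language, category) pairs,
and only exceptions (a) and (c) are covered. The sp:1 type of
superscript and exception (b) are not modelled.

```python
from typing import List, NamedTuple, Optional, Set, Tuple

LANGS = {"sp", "eng"}  # the two lexicons assumed available


class Sym(NamedTuple):
    cat: str                               # category, e.g. "VP", "V", "ADV"
    sup: Optional[Tuple[str, str]] = None  # superscript, e.g. ("sp", "v")


def rewrite(parent: Sym, output: List[str]) -> List[Sym]:
    """Rewrite parent as the given output categories, transmitting the
    parent's superscript to every output symbol except an embedded S
    (exception (a) in the text)."""
    return [Sym(cat, None if cat == "S" else parent.sup) for cat in output]


def lexicalization_languages(sym: Sym) -> Set[str]:
    """Languages from which a terminal category may be lexicalized."""
    if sym.sup is not None and sym.sup[1] == sym.cat.lower():
        return {sym.sup[0]}   # the category named in the superscript
    return set(LANGS)         # any other category is unrestricted


vp = Sym("VP", ("sp", "v"))
v, adv = rewrite(vp, ["V", "ADV"])          # rule (15)
print(v, lexicalization_languages(v))       # V is restricted to Spanish
print(adv, lexicalization_languages(adv))   # ADV may come from either lexicon
```

Storing the superscript on the symbol itself keeps the grammar
context-free: each superscripted symbol simply counts as a distinct
category.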

In another example describing Spanish conjoined noun phrases
both modified by a shared adjective (16a), the rule must be respecified
as (16b), so that the CONJ and any element of each of the conjoined
NPs, other than the N, may be switched to English. The sp:n
superscript is of the same type as the sp:v superscript in (15) and
has the same heritability condition.

(16a) Spanish conjoined NP: NP -> NP CONJ NP ADJ

(16b) Code-switching: NP -> NP^sp:n CONJ NP^sp:n ADJ^sp:adj

Are any other types of superscripts involving different
heritability conditions necessary? In the present study we have not
found any necessary, but this may simply be a function of the two
languages involved, and of the precise way the free morpheme and
equivalence constraints function for particular pairs of languages.
Thus our procedure for constructing the set of rules in the
code-switching grammar may have to be modified as different types of
non-equivalence are examined [note 6]. The fundamental principle,
however, will remain the systematic comparison of corresponding G1 and
G2 rules.

Every time a discrepancy between G1 and G2 higher order rules
may lead to a violation of the code-switching constraints, we must
first identify the constituents which risk being involved in this
violation. We then incorporate lexicalization restrictions in the terms
of the higher order rule, restrictions which carry with them certain
heritability conditions to ensure that lexicalization is carried out
appropriately but is not overly constrained. This entails a
proliferation of categories and rules in the grammar, but does not
interfere with its context-free nature. Note that the restrictions are
a function of the similarities and differences between the two
languages involved, and derive only from the equivalence and free
morpheme constraints and not from any other purported universal
syntactic properties of VPs, for example.

9. PROBABILISTIC GRAMMARS

In the remainder of this paper, we will analyze the syntactic
aspects of code-switching heard in the speech of Puerto Rican
bilinguals. Though the context-free grammar for the code-switching mode
described above may well account for the types of switches allowed and
those excluded in this corpus, it cannot by itself capture many of the
other regularities observed in this type of discourse. In particular,
and it shares this inability with any generative grammar when
confronted with performance data, it cannot account for the many
striking quantitative patterns evident in the discourse.

A grammar will, however, generate the quantitative structure of
a language as well as its qualitative or categorical aspects, if a
suitable probabilistic component is added to the generative machinery.
Context-free grammars are easily probabilized, as noted years ago by
e.g. Klein (1965) and Grenander (1967). Probabilistic context-free
grammars have been used to study style-shifting (Klein 1965), first
language acquisition (Suppes 1970), grammatical inference (Horning
1969; Sankoff 1971, 1972), the acquisition of German by migrant workers
(Heidelberger Forschungsprojekt "Pidgin-Deutsch" 1978; Klein and
Dittmar 1979), and differences in noun phrase structure in written and
spoken English (Hindle 1980).

In this section we will discuss the relationship between the
probabilistic context-free grammars generating L1 and L2 monolingual
speech on the one hand, and the probabilistic code-switching grammar
on the other. This will serve as a conceptual framework for the analysis
in the next sections.

The key to the probabilization of a context-free grammar is that
when a node of a given category is to be rewritten, the choice of
rewrite rule is made according to a set of probabilities over all
possible rules for rewriting that category, and is made independent of
all other choices of rewrite rules in the derivation. Thus if the only
ways to rewrite NP in a grammar were summarized by

NP -> (DET) N (ADJ)

then each of the possibilities NP -> N, NP -> DET N, NP -> N ADJ and
NP -> DET N ADJ would be assigned a probability, i.e. a number
between zero and one, in the definition of the grammar, and these
numbers would have to sum to one. Then every time an NP was to be
rewritten, a random (not to be confused with equiprobable) choice
among the four possibilities would be made with each one's chances
of being chosen equal to its associated probability. A similar set of
probabilities would exist for the rules rewriting S, another set for
VP, and so on.
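
The random choice among the four NP expansions can be sketched in
Python as follows. The probabilities are invented for illustration and
are not estimates from any corpus; NP_RULES and choose_rule are
hypothetical names.

```python
import random

# Hypothetical probabilities for the four expansions of NP -> (DET) N (ADJ).
NP_RULES = {
    ("N",): 0.2,
    ("DET", "N"): 0.5,
    ("N", "ADJ"): 0.1,
    ("DET", "N", "ADJ"): 0.2,
}


def choose_rule(rules, rng):
    """Pick one right-hand side at random, each with its own probability,
    independently of every other choice made in the derivation."""
    if abs(sum(rules.values()) - 1.0) > 1e-9:
        raise ValueError("rule probabilities must sum to one")
    options = list(rules)
    weights = [rules[rhs] for rhs in options]
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [choose_rule(NP_RULES, rng) for _ in range(10000)]
share_det_n = sample.count(("DET", "N")) / len(sample)
print(round(share_det_n, 2))  # close to 0.5 in a large sample
```

The choice is random but not equiprobable: over many rewrites each
expansion turns up in proportion to its assigned probability.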

For a given context-free grammar the rule probabilities can be
estimated by examining a sufficiently large corpus, or sample of the
generated language, parsing each sentence, and counting rule
frequencies. If there are ambiguities, more complicated procedures are
necessary (Sankoff 1971, 1972).
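
For the unambiguous case, the counting procedure amounts to taking
relative frequencies per rewritten category. The sketch below assumes
that each parsed sentence has already been reduced to a list of
(category, expansion) pairs; the function name and the toy counts are
hypothetical.

```python
from collections import Counter, defaultdict


def estimate_rule_probabilities(rule_uses):
    """rule_uses: iterable of (lhs, rhs) pairs, one per rule application
    found in the parsed corpus. Returns {lhs: {rhs: relative frequency}}."""
    counts = Counter(rule_uses)
    totals = defaultdict(int)
    for (lhs, _rhs), n in counts.items():
        totals[lhs] += n
    probs = defaultdict(dict)
    for (lhs, rhs), n in counts.items():
        probs[lhs][rhs] = n / totals[lhs]
    return dict(probs)


# Invented toy counts, not data from the corpus discussed in the text.
toy_uses = (
    [("S", ("NP", "VP"))] * 3
    + [("S", ("VP",))] * 1
    + [("NP", ("PRO",))] * 2
    + [("NP", ("DET", "N"))] * 2
)
probs = estimate_rule_probabilities(toy_uses)
print(probs["S"])
```

Because each category is normalized separately, the estimated
probabilities for the rules rewriting any one category sum to one.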

Our conditions in the previous section on the translatability
of categories of G1 and G2 mean that they are essentially two
probabilistic context-free grammars using the same set of symbols, and
this led to a natural definition of the code-switching grammar.
Complications arise when we come to probabilize the rules of this new
grammar. How are the probabilities associated with the rules of G1 and
G2 combined to produce the probabilities of the rules in the
code-switching grammar? To answer this question will require a great
deal of empirical research. Once sufficient data analysis enables us to
establish the mechanism for combining probabilities, this mechanism
will be the key to truly integrated deductive/inductive research on the
relationship between probabilistic monolingual and code-switching
grammars. That is, the statistical properties of the code-switching
grammar will not only be empirically observable in code-switching
discourse, but will also be predictable from the statistical properties
of the monolingual grammars. The format of the data available to us,
however, and the preliminary nature of this exercise, permit us access
to code-switching statistics only by directly examining code-switching
discourse, and not by deduction from the monolingual grammars. For the
present we can only speculate on the details of the probabilistic
mechanisms involved in combining grammars.






The simplest hypothesis takes account of the observation that a
given stretch of code-switching discourse is characterized by a certain
proportion of L1 and a certain proportion of L2. These proportions
are sensitive, among other things, to the bilingual ability of the
speaker, and the nature of the interlocutor, situation and topic, but
even with all such factors held constant, basically monolingual
stretches alternate with stretches of high code-switching density, as
mentioned in section 4.

The hypothesis would have rules for rewriting a category in the
code-switching grammar chosen at random from the eligible rules in G1
and G2, with the probabilities being a compromise between the
probabilities in the two monolingual grammars, weighted according to
the proportion of L1 and L2 in the overall discourse. (There would be
exceptions, of course, especially when certain superscripted categories
were rewritten.) It seems likely, however, though this would need to
be verified mathematically and experimentally, that this choice
mechanism would yield far more multiply switched sentences than are
empirically observed. To circumvent this difficulty, it will probably
be necessary to allow sp or eng superscripts on some phrase structure
nodes aside from those discussed in section 8. When such a node is to
be rewritten, each sub-category will be superscripted in the same way
(or each lexicalization would be in the same language). Depending on
the prevalence of such superscripted nodes, we can obtain rates of
code-switching per sentence more in accord with observed tendencies.
The complete solution of this problem awaits further quantitative
research, but later we will present empirical evidence that the
code-switching grammar probabilities do represent compromises between
the two monolingual grammars.
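
One way to read "compromise ... weighted according to the proportion
of L1 and L2" is as a linear mixture of the two monolingual rule
distributions. This is only an assumption made for illustration: the
text does not commit to a particular combining mechanism, and the
sketch ignores the exceptions for superscripted categories. All names
and numbers below are hypothetical.

```python
def mix_rule_probabilities(p1, p2, w1):
    """Weighted compromise between two monolingual rule distributions
    for a single category. w1 is the proportion of L1 in the discourse;
    the remaining 1 - w1 is the proportion of L2."""
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie between 0 and 1")
    rules = set(p1) | set(p2)
    return {r: w1 * p1.get(r, 0.0) + (1.0 - w1) * p2.get(r, 0.0)
            for r in rules}


# Invented illustrative values for the rules rewriting S.
spanish_S = {"VP": 0.6, "NP VP": 0.4}
english_S = {"VP": 0.1, "NP VP": 0.9}
mixed = mix_rule_probabilities(spanish_S, english_S, w1=0.5)
print(mixed)
```

A mixture of two probability distributions is again a distribution, so
the combined rule probabilities for a category still sum to one
whatever weight is chosen.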

Why are the rule probabilities of the code-switching grammar
so important? It is because the probabilities in a context-free grammar
determine ALL the statistical and quantitative properties of the
language it generates. In particular, they completely determine the
preferred locations and frequency of code-switches within the sentence.
And it is the comparison of these theoretical predictions with the type
of observations and calculations in the next section which is the most
promising way of verifying a formal syntax of code-switching.

10. CODE-SWITCHING FREQUENCIES AND RATES 

In the study of code-switching it does not suffice to document
the rarity of exceptions to purported syntactic constraints in order to
prove them. For example, Timm (1978) attempted to validate the
universality of the syntactic constraints she earlier (1975) felt to be
valid for Spanish-English switching, by counting the exceptions to these
constraints in Russian-French code-switching discourse in Tolstoy's
WAR AND PEACE. For most of the constraints conjectured she found
only a few exceptions. However, since she does not indicate how much
code-switching discourse is contained in the opus or how many
code-switches there are in all, or how many are intra-sentential, the
significance of the exceptions cannot be assessed.

Previous quantitative studies (Pfaff 1975, 1976; Poplack 1978,
1979a) have been more revealing in showing what proportion of
code-switches involved nouns, what proportion determiners, etc. Even
this, however, does not give a clear indication of the quantitative
effects of syntactic context on code-switching. Just because single
nouns, for example, were found to constitute 14% of the switches, while
predicate adjectives made up only 3%, this does not necessarily mean
that nouns are more likely to be switched than predicate adjectives.
Perhaps nouns occur 5 or 10 times more often in discourse than
predicate adjectives. To estimate the true relative susceptibility of a
syntactic boundary as a code-switch site, we divide the raw frequency
of switches at each type of boundary by the frequency of occurrence of
this boundary in the code-switcher's discourse.
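
The arithmetic behind this caution is easily worked through. In the
sketch below the 14% and 3% shares come from the text, while the
assumption that nouns occur five times as often as predicate adjectives
is purely hypothetical, as is the function name.

```python
def relative_switch_rate(switch_share, relative_frequency):
    """Share of all switches falling at a site, divided by how often
    that site occurs (in arbitrary but common units)."""
    return switch_share / relative_frequency


# Shares from the text; the 5:1 frequency ratio is an assumption.
noun = relative_switch_rate(0.14, 5.0)
pred_adj = relative_switch_rate(0.03, 1.0)
print(noun < pred_adj)
```

Under that assumption the normalized figure for nouns falls slightly
below the one for predicate adjectives, reversing the ranking suggested
by the raw shares.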



Thus we undertook to estimate the overall occurrence rate of
various constituent boundary types in typical discourse containing
code-switches. We isolated, in a series of recorded conversations with
a balanced bilingual speaker, some 30 stretches of discourse containing
code-switches. The one or more sentences in each stretch were parsed
using a limited number of syntactic categories, as in (17).

(17) Y en Puerto Rico he would say que cortaba caña, even
     though tenía su negocio, you know. [And in Puerto Rico
     he would say that he cut cane even though he had his
     own business, you know.] (S.L./32)

See Diagram on Page 25.

In accordance with the discussion at the end of section 9 above,
we also attempted to infer which nodes of the phrase structure tree
could be unequivocally identified as to language. The following
criterion was adopted: whenever a node dominated only Spanish lexical
items, the rule rewriting it was classified as a Spanish rule, and
analogously for English rules. The remainder, those that dominated both
Spanish and English lexical items (in example (17), the S, VP and
ADVB'L nodes), were listed separately as most representative of the
code-switching mode.
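
The classification criterion can be stated as a small recursive
procedure. The tree encoding and the function names below are
hypothetical, and the toy tree is a simplified parse of a fragment of
(17) made up for illustration, not the analysis in the diagram.

```python
def leaf_languages(node):
    """node is (category, language) for a lexical item, or
    (category, [children]) for a non-terminal."""
    _category, content = node
    if isinstance(content, str):
        return {content}
    languages = set()
    for child in content:
        languages |= leaf_languages(child)
    return languages


def classify(node):
    """Label the rule rewriting a node by the languages it dominates."""
    languages = leaf_languages(node)
    if languages == {"sp"}:
        return "Spanish"
    if languages == {"eng"}:
        return "English"
    return "code-switching"


# Simplified, made-up parse of "would say que cortaba caña".
embedded = ("S", [("CONJ", "sp"),
                  ("VP", [("V", "sp"), ("NP", [("N", "sp")])])])
vp = ("VP", [("MOD", "eng"), ("V", "eng"), embedded])
print(classify(embedded), classify(vp))
```

A node's label depends on everything it dominates, so nodes high in the
tree are the ones most likely to fall into the code-switching class.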

This manner of identifying node languages applies more widely
than the language choices required by the code-switching constraints
discussed in sections 7 and 8. In example (17), the only superscripts
imposed by the equivalence constraints would be sp:v on all nodes of
the VPs cortaba caña and tenía su negocio, reflecting the impossibility
of subject pronoun absence in English in this context. As argued in
section 9, the additional specification of language in some nodes not
involved in code-switching constraints may help to better account for
observed rates of code-switching. Although in parsing we can easily
identify the nodes, further research will be necessary before we can
suggest a probabilistic mechanism for the choice of such nodes in the
generation process.

11. RULE FREQUENCIES 



Using the surface phrase structures obtained from the parsing
procedure, we were able to tabulate (a) the frequency of the various
rewrite rules used in generating the sentences, and (b) the frequency
of constituent boundaries of various types.

As in the previous section, we point out that theoretically, we
would like to use the estimates derived from the Spanish only and
English only data to predict rule probabilities in the code-switching
mode. Further, we would like to predict the frequency of various
constituent boundaries as well as the switch frequency at these
boundaries. These predictions could then be compared to the empirical
results with a view to further refining the theory.



As discussed in the previous section, however, neither our
knowledge of the generative machinery, nor these preliminary data, are
sufficient for detailed inference based on a probabilistic context-free
grammar model. In Table 1, however, we can make some inter-code
distinctions by separating rewrite rules applying to nodes identified
as English and those identified as Spanish. The remainder are listed
under 'code-switching'. Certain differences are obvious in the Table.



The most striking distinction in Table 1 is the tendency for
English sentences to be derived by a S -> NP VP rule followed by
NP -> PRO, whereas in Spanish the dominant tendency is for S -> VP,
followed by NP -> (DET) N. This difference reflects the prevalent
option for subject pronoun deletion in Spanish.

Columns: ENGLISH, SPANISH, CODE-SWITCHING

S rules
  S -> NP VP
  S -> (TAG) VP (TAG)
  S -> (CONJ) NP ...
  S -> VP NP (TAG)
  S -> {TAG | ADVB'L} ...
  S -> NP S (TAG)

NP rules
  NP -> PRO
  NP -> (DET) N
  NP -> (DET) ADJ* N
  NP -> (DET) N ADJ'L*
  NP -> NP S
  NP -> NP CONJ NP

VP rules
  VP -> (MOD) V NP (ADVB'L) ...
  VP -> (ADV) (MOD) V ADVB'L ...
  VP -> PRO*+ (MOD) V (NP)
  VP -> NEG V NP
  VP -> (AUX) V (SUB CONJ S)

The percentage cells and some bracketed alternatives (PREP PHR, S) are
not recoverable from the source. Legible entries: 38, 62%, 9, 0, 6;
(n = 32), (n = 17), (n = 14).

Table 1. Probabilistic phrase structure grammars. S, NP and VP rewrite
rules for English, Spanish and code-switching modes.
* One or more constituents of this type.
+ These are object NPs, pronouns, or reflexive clitics.

In the NP rules we note a difference between adjective or
adjectival placement in Spanish and English. This is only a
quantitative difference: though most Spanish adjectives must be
postposed, some may precede the noun, and both English and Spanish
adjectivals follow the noun.

In the VP rules, we note the difference between Spanish and
English in their verb auxiliaries and negation, and in the position
of the object NP. In Spanish but not English, the object NP is
optionally preposed, and obligatorily preposed in many cases when it is
pronominalized.

There is a general tendency for the numbers in the code-switching
column to resemble the English figures in some respects and the
Spanish in others. The exceptions result from two factors. One is
simply statistical fluctuation due to the sample size. The other, more
important, is the apparent elevated tendency for recursive rules
involving subordination and conjunction to be employed in the
code-switching mode when rewriting S, NP and VP. This latter tendency
is probably largely due to the fact that those rules used late in the
derivation, containing few embedded constituents, were most likely to
be clearly monolingual, i.e. Spanish or English, while those rules used
earlier, generating conjoined and subjoined structures, dominated many
more constituents and were thus more likely to dominate constituents
of both languages, so that they could not be inferred to be drawn from
either the Spanish or the English grammar.

An important conclusion to be drawn from this part of the 
exercise is that even in those portions of discourse in close proximity to 
one or more code-switches, the speaker is strictly maintaining the 
qualitative and quantitative distinctions between the Spanish and 
English grammars. Whenever a stretch of discourse, no matter how
short, can be clearly identified as monolingual, the rules of the
appropriate monolingual grammar, and their associated probabilities,
are exclusively in play.

Along with the rewrite rules discussed in the previous section,
the parsing exercise also produced frequency counts of constituent
boundaries of various types in the 30 discourse stretches analyzed. In
this section, we combine these data with the results of Poplack (1979a)
on the observed frequencies of switches of various grammatical
categories, in order to evaluate switch rates, i.e. the propensity of
given syntactic boundaries to be the site of a code-switch.

Had the latter data been in terms of switch frequencies at the 
various constituent boundaries, and had the two data sets been com- 
piled on exactly the same corpus, it would have been an easy matter to 
divide the switch frequency at each boundary type by the frequency of 
that boundary type, and hence, to derive the switch rate for that type
of constituent boundary. 

But because the 1979a data were compiled in terms of the 
grammatical category of the switched item, we first had to convert 
them to boundary terms by cross-tabulating the category of the switch- 
ed item with the categories of the preceding and following items. 

And because the corpus for the category frequency data was 
not identical to the corpus for the boundary frequency data, dividing 
the former (converted from category to boundary terms) by the latter
does not give the switch rate, but a number which must be multiplied 
by a certain factor to obtain the switch rate. This factor is largely 
determined by the relative size of the two corpora, and remains con- 
stant for all boundary types, since the same disproportion between the 
two corpora holds for the data from each type of syntactic boundary. 

This means that even if the numbers obtained by dividing 
switch frequencies by boundary frequencies are not the actual switch 
rates, they are all proportional to the 'true' switch rates by the same
constant of proportionality. 

In any case, we have already noted that in code-switching
discourse, rates are by no means homogeneous, either from situation to
situation, or from speaker to speaker. Thus, dividing switch
frequencies by boundary frequencies for the whole corpus, including the
largely monolingual parts, would have produced rates too low for
stretches where switches are dense, and too high for stretches where
they are rare. (Again, however, 'too high' and 'too low' would apply
uniformly across all boundary types, so that if the estimated rates are
not really applicable to a given stretch of discourse, they are all
proportional to the true rates.)

30 DAVID SANKOFF & SHANA POPLACK

Moreover, given that a speaker's propensity for switching
differs according to both extralinguistic factors and the specifics of
the given conversational interaction, the calculation of absolute, or
universal, switch rates does not seem to be a very meaningful goal. But
since we cannot expect any interaction between these extralinguistic
factors and the boundary types affected by switching (with one
exception to be discussed below), changing the situation will change
the switch rates, but only in a proportional way across all boundary
types.

In sum, our primary goal must be to calculate not the switch
rates themselves, but the ratios between the switch rates at various
syntactic boundaries. As the situation changes, or the speaker changes,
or even from one stretch of conversation to another, the switch rates
will all change, but will remain in the same proportion to each other.
Thus we need not be overly concerned about the fact that our
calculations only produce figures proportional to code-switching rates
rather than the rates themselves, since it is only the proportionality
among the rates which can hold throughout a discourse, from speaker to
speaker and from situation to situation.

Thus in Table 2, we show the RELATIVE propensity for each
syntactic boundary type to be the site of a switch, using the formula
in (18).

(18) code-switch rate at a given syntactic boundary
       = constant × (number of switches at boundary)
                  / (frequency of boundary)

In Table 2, a constant was chosen in an effort to obtain the
probability of a code-switch at a given syntactic boundary in a typical
stretch of code-switching discourse. The figures in Table 2 are, as we
have stressed, meaningful in a proportional sense only, i.e. they may
all be too high or too low by a constant factor, and this factor will
change from situation to situation, and from speaker to speaker.
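
That the ratios between rates do not depend on the constant in (18) can
be checked with a short Python sketch. The counts, the boundary labels
and the function name are invented for illustration; they are not the
data behind Table 2.

```python
def switch_rates(switches, boundaries, constant=1.0):
    """Formula (18): constant times the number of switches at a boundary
    type, divided by the frequency of that boundary type."""
    return {b: constant * switches[b] / boundaries[b] for b in switches}


# Invented counts for two boundary types.
switches = {"DET|N": 26, "V|NP": 9}
boundaries = {"DET|N": 200, "V|NP": 300}

rates_a = switch_rates(switches, boundaries, constant=1.0)
rates_b = switch_rates(switches, boundaries, constant=0.4)

ratio_a = rates_a["DET|N"] / rates_a["V|NP"]
ratio_b = rates_b["DET|N"] / rates_b["V|NP"]
print(abs(ratio_a - ratio_b) < 1e-9)
```

Changing the constant rescales every rate by the same factor, so it
cancels in any ratio of two rates. This is why only the proportions
among the figures are interpreted.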

See Table 2 on Page 32. 

Table 2 shows that constituent boundaries are clearly subject to 
a hierarchy ranging from very high propensity to be the site of a switch, 
to total absence of switching. We remark first that prohibited switch 
sites are precisely those in the vicinity of which the number and/or 
order of sentence elements generated by a given rule is excluded in one 
of the two languages, i.e. those which violate the equivalence con- 
straint. Included here are constructions involving NEG placement, 
which in Spanish directly precedes the main verb, as in (19), while in 
English it follows an auxiliary or a modal as in (20). 

(19) An' the second one, I seen everything 'cause no cogí na'
     [I didn't take anything]. (S.L./1)

(20) La anestesia [the anesthesia], I didn't take it. (S.L./2)

Also included here are constructions involving reflexive and 
object pronoun clitic placement, which in Spanish precede the verb, 
as in (21), and in English follow if they appear in the surface struct- 
ure at all; similarly, for Spanish constructions in which the subject 
NP follows the verb, as in (22). Switches in these examples may occur 
around, but not at, the boundaries in question. 

(21) This one, he doesn't wanna eat casi, right? Se le da un
     dolor de barriga [he gets a stomach ache]. He gets a
     lot of stomach pains. (S.L./4)

(22) I really been in here, which quería Juan [Juan wanted]
     you know, desde [since] nineteen seventy two.
     (S.L./28,29)

SWITCH SITE                                                  RATES

Between tag and preceding or following category              40%
Between ADV and ADVB'L and preceding or following category   5-10
Between PRED ADJ and preceding category                      15
Between DET and N or NP                                      13
Between coordinate conjunction and PRECEDING category        [illegible]
Between subordinate conjunction and FOLLOWING category       [illegible]
Between {V, VP} and NP                                       2.7-3.6
Between coordinate conjunction and FOLLOWING category        [illegible]
Between ADJ and {N, NP}                                      [illegible]
Between {N, NP} and {V, VP}                                  2.3
Between PREP and FOLLOWING category                          2.3
Between {V, VP} and PREP PHRASE                              2.3
Between {AUX, MOD} and {V, VP}                               [illegible]
Between PREP PHRASE and ADJ'L (except after {V, VP})
  and PRECEDING category                                     [illegible]
Between subordinate conjunction and PRECEDING category       [illegible]
Between pronoun and preceding or following category          [illegible]
Between clitics and V                                        [illegible]
Between AUX and NEG                                          [illegible]
Between NEG and {V, MOD}                                     0
Between VP and subject NP                                    0

Table 2. Code-switching rates at different syntactic boundaries.

Note that these restrictions stem only from the differences
between the two languages involved in the code-switching mode.
Published data on French-Italian, for example, which both make use of
equivalent rules of clitic pronoun placement, include a switch between
clitic and verb: si sent 'S/he feels' (di Sciullo et al. 1976).

At the other extreme, the greatest propensity to switch is 
shown by the category TAG, both when it precedes and follows each
of the 16 other syntactic categories studied, as in (23), for example, 
and despite the fact that this segment occurs relatively rarely in non- 
code-switched discourse. 

(23) Yo estaba aburrecido, muriéndome, you know? [I was
dying of boredom, you know?] (C.B./28) 

This reflects the fact that tags are subject to minimal if any syntactic
restrictions and so may be switched easily without fear of violating the
equivalence constraint. Indeed, switches of precisely this category were
found (Poplack 1979a) to characterize the discourse of non-fluent
bilinguals, allowing them to participate in the code-switching mode
although they lacked the bilingual ability in L2 to engage in more
complex switching.

If any boundary types do not obey the proportionality
relationship discussed above, it will be those involving tags. Thus for
certain speakers, switches involving tags will be increased
dramatically, while those involving other constituents will not only
not increase proportionally, but may even decrease. The 40% figure
attached to tags may be somewhat exaggerated relative to the other
rates because the 'category frequency' data on which they are based
contained many tags switched by non-fluent bilinguals who engaged in
little other intra-sentential code-switching.

Another favored switch point is before a predicate adjective.
This preference contrasts sharply with the restrictions against
switching at the non-equivalent noun + adjective or adjective + noun
boundaries to be discussed below.

The point between determiner and noun will be the site of a
switch about 13% of the time according to these calculations, a finding
reflecting the great susceptibility of nouns not only to be borrowed,
but also to be switched, as is widely noted in the literature (e.g.
Weinreich 1953, Gumperz 1971, Timm 1975, Wentz 1977).

Finally, adverbs and adverbial phrases, both preceding and
following the other constituents studied, are very likely to be
switched, with a rate of 5-10% depending on the specific constituent
with which they are combined. This again reflects, though not as
strikingly as for tags, the large number of slots these categories may
occupy (as in (24), for example) within the sentence without fear of
violating the equivalence constraint.

(24a) A los cuatro meses [at four months] they start munching
      on some rice and beans. (S.L./8)

(24b) Uno no podía comer carne [we couldn't eat meat] every
      day. (S.L./20)

Conjunctions and prepositions show an interesting pattern of
asymmetries in these data. Coordinate conjunctions tend to be in the
language of the following constituent, as evidenced by the high
propensity to switch before such constituents in contrast with an
average propensity to switch after them. Subordinate conjunctions and
prepositions, however, tend strongly to remain in the language of the
head element on which they depend, and it is the remainder of the
dependent clause which is switched. This switch rate would seem to tie
in with Gumperz' (1976) constraint requiring that the conjunctions be
in the same code as the conjoined sentence, at least insofar as
coordinate conjunctions are concerned. Why coordinate and subordinate
conjunctions should behave distinctly in this regard, however, is not
immediately apparent. Nor are the data conclusive. Examples such as
(25) are not rare. For the moment, then, we must allow for the
possibility that the quantitative patterns are due to sparse data.

(25a) I could understand que [that] you don't know how to
      speak Spanish, ¿verdad? [right?] (S.L./75)

(25b) Right to 104th Street donde tenía una casa [where I had
      a house] which were furnished rooms. (S.L./25)

(25c) Any kind of book that's interesting, about Mafia o [or]
      love story o sex books or things like that.

If the tendency of switches to occur after and not before
prepositions and subordinate conjunctions is borne out, however, this
would dispel any identification of high order constituent boundaries
with ease of switching, and constituents linked by late rewrite rules
with resistance to switching: prepositions and subordinate
conjunctions are both linked at a higher level with their header
categories than with what follows them.

The boundary between verb and following object NP shows a
somewhat higher switch rate than that between preceding subject NP 
and following VP, though both types of switches are far more frequent 
than any before or after a subject pronoun. Indeed, it is precisely the 
very low propensity of subject pronouns to be switched which explains 
why scholars have posited categorical constraints against switching 
them (e.g. Timm 1975, 1978; Gumperz 1976), and which most clearly 
illustrates the utility of a quantitative approach to the study of code- 
switching. 

We remark that a large proportion of syntactic boundaries are
affected by the same, intermediate switch rate of approximately
2.2-2.3%. Now, if the equivalence and bound morpheme constraints were
not only qualitatively but also quantitatively the only constraints on
code-switching, we would expect all switch rates for all boundaries to
be the same. And indeed, apart from the especially susceptible types,
largely involving freely moveable constituents, and the very low
frequency types, which in some cases seem to approach being morphemic
rather than syntactic boundaries, all other constituent boundaries
have switches at a rate proportional to the frequency of these
boundaries in monolingual speech.
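
The comparison behind this observation can be sketched as follows. The sketch assumes that a switch rate is the number of switches observed at a boundary type divided by the number of occurrences of that boundary type in the monolingual comparison corpus. The function name, the boundary labels and all counts are hypothetical illustrations; they are not the figures of Table 2.

```python
def switch_rate(switches, occurrences):
    """Switches at a boundary type per occurrence of that boundary type."""
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    return switches / occurrences


# Hypothetical counts: (switches observed, boundary occurrences).
# These numbers are invented for illustration only.
counts = {
    "boundary_A": (23, 1000),
    "boundary_B": (22, 1000),
    "boundary_C": (80, 1000),
    "boundary_D": (2, 1000),
}

rates = {name: switch_rate(s, n) for name, (s, n) in counts.items()}

# Boundary types sorted from most to least switch-prone.
ranked = sorted(rates, key=rates.get, reverse=True)
```

If switches were simply proportional to boundary frequency, every entry in `rates` would be roughly equal; entries far above or below the common value correspond to the especially susceptible and the very low frequency types mentioned above.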
We find that even the boundary between adjective and noun has
an intermediate switch rate, i.e. the propensity for this boundary to be
the site of a switch largely reflects its frequency of occurrence in
non-switched discourse. This is somewhat surprising since most Spanish
adjectives do not follow the equivalent word order, as may be seen in
(26).

(26a) No coge la estación latina. [It doesn't get the Latin
station.] (W.B./23)

(26b) Because they're Spanish people. (W.B./62)

Many do, however, and at any rate, this switch site has already been 
shown (Poplack 1979a) to represent the majority of the few attested 
violations of the equivalence constraint.

Showing a relatively low propensity to be the site of a switch is
the point between auxiliary or modal and verb, which again explains
why categorical constraints have been posited (Timm 1975, 1978)
against switching here.

13. DISCUSSION 

In constructing a formal apparatus as a framework for the em- 
pirical exercise, several points emerge. The code-switching constraints 
are surface phenomena and cannot be naturally generated in deep structure.
Phrase structure grammars for L1 and L2 can be combined to
form a code-switching grammar which generates grammatical mono- 
lingual sentences as well as those containing only valid code-switches. 

Turning to the data analysis itself, we find that rule probabilities
for the code-switching grammar represent a compromise between
G1 and G2 probabilities, but the details of this compromise remain to
be investigated. Finally, the switching propensities for various syntactic
boundaries yield a clear and simple picture of syntactic effects on
code-switching. For most boundary types, switches occur with a rate
proportional to the occurrence of the boundary type. Freely moveable
constituents have more switches at their boundaries, while boundaries
between constituents which are highly constrained to occur together,
approaching the status of bound morphemes, are more resistant to
switches.

We do not claim perfect accuracy for all the figures in Table 2,
given the size of our sample, possibilities of incompatibility of the two
corpora used, and the rough nature of the syntactic analysis. Nevertheless,
their interpretation is quite clear. The equivalence and free morpheme
constraints extend quantitatively to performance data: not only
are all boundaries which satisfy the equivalence constraint eligible for
code-switching, but most are equally LIKELY to be the site of a switch.
Those exceptional boundaries which show a relatively low rate of
switching involve two closely bound syntactic elements whose relationship
approaches, but does not quite enter, the domain of the free
morpheme constraint. This quantitative approach permits an analysis 
which accounts for more of the data and is more scientific than the 
constraint-and-exception paradigm which has characterized the code- 
switching literature. 

To the extent that the code-switching constraints, both in their
qualitative and quantitative aspects, are validated by this and future
studies, they may prove to be useful tools in the study of monolingual
syntactic structure. We have already seen, for example, that the free
morpheme constraint prohibits switches categorically only between
truly bound forms, but that it operates in a weaker way between forms
which are closely linked but not clearly bound. We may now reverse
the argumentation and make use of this fact to evaluate the status of
binding relationships between morphemes in monolingual speech. If
two supposedly bound morphemes in a language are investigated in a
code-switching situation and found never to be separated by a
code-switch, their bound status is confirmed. If their boundary is
susceptible to switches, but only at a low rate, we may say they are weakly
bound, and so on.
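
As a rough sketch of this diagnostic, the three-way decision might be written as below. The function name, the category labels and in particular the cut-off `low_rate` are assumptions made for illustration; the text gives no numerical threshold, and any real application would also have to take sample size into account before treating zero observed switches as categorical.

```python
def binding_status(switches, occurrences, low_rate=0.01):
    """Classify a morpheme boundary from code-switching counts.

    low_rate is a hypothetical threshold, not a value from the paper.
    """
    if occurrences <= 0:
        raise ValueError("occurrences must be positive")
    if switches == 0:
        # Never separated by a switch: bound status is confirmed.
        return "bound"
    if switches / occurrences < low_rate:
        # Switched, but only rarely: weakly bound.
        return "weakly bound"
    return "not bound"
```

For instance, with these assumed settings a boundary switched 2 times in 1000 occurrences would be labelled "weakly bound".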

Similarly, for the equivalence constraint, where there is some
confusion over the rules generating a certain class of structures in
monolingual speech, an investigation of the proposed syntactic boundaries in
the code-switching situation may help clarify the situation. For example,
Spanish preposed objects may be generated in two ways: directly
in the verb phrase, as in (27), or by topicalizing extraposition, as
in (28).

(27) VP -> NP V

Ellos al gato mataron. 'They killed the cat.'

(28) S -> NP S

Al gato, ellos mataron. 'The cat, they killed.'

Since subject pronoun deletion is common in Spanish, both 
(27) and (28) reduce to (29) : 

(29) Al gato mataron. 'They killed the cat.'

For a speech variety where sentences like (29) are common (not the
case for Puerto Rican Spanish), an investigation of the possibility of
switches into English between gato and mataron would be diagnostic of
the syntactic structure. If (29) has the same structure as (28), such 
switches would be common. If the structure is like (27), they would be 
prohibited, since English cannot prepose object NPs in the VP. 
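
The reasoning in this example can be sketched in code. The sketch below is a deliberately simplified reading of the equivalence constraint, not the superscript mechanism of the code-switching grammar: it assumes that a switch between two adjacent sister constituents is permitted only if both monolingual grammars contain a rule, with the same left-hand side, in which those two constituents appear in that order. The toy rule sets and the function names are hypothetical and cover only the rules needed for (27) and (28).

```python
def adjacent_pairs(rhs):
    """All pairs of immediately adjacent categories in a rule output."""
    return set(zip(rhs, rhs[1:]))


def switch_allowed(lhs, left, right, grammar_a, grammar_b):
    """True if both grammars order `left` before `right` under `lhs`."""
    def has_order(grammar):
        return any((left, right) in adjacent_pairs(rhs)
                   for rhs in grammar.get(lhs, []))
    return has_order(grammar_a) and has_order(grammar_b)


# Toy fragments (assumptions for illustration only).
spanish = {
    "S": [("NP", "VP"), ("NP", "S")],   # includes topicalization, as in (28)
    "VP": [("V", "NP"), ("NP", "V")],   # includes preposed object, as in (27)
}
english = {
    "S": [("NP", "VP"), ("NP", "S")],   # 'The cat, they killed.'
    "VP": [("V", "NP")],                # no preposed object in the VP
}

# Structure (27): object NP preposed inside the VP.
like_27 = switch_allowed("VP", "NP", "V", spanish, english)
# Structure (28): topicalized NP attached to S.
like_28 = switch_allowed("S", "NP", "S", spanish, english)
```

Under these assumptions `like_27` is false and `like_28` is true, which mirrors the argument in the text: observing switches between gato and mataron would favor the structure in (28).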

A third area where code-switching may be an indication of
syntactic structure is in evaluating the relative importance of constituent
hierarchy and lexicon in the structure of sentences. For example,
prepositions (or subordinate conjunctions) introducing a verb complement
may be heavily constrained lexically, i.e. by the verb in question. In
the constituent hierarchy, however, those items will be more closely
grouped with the other elements of the complement than with the verb.
The possibility that switches occur more readily after prepositions and
subordinate conjunctions than before may reflect the greater weight of
lexicon-controlled constraints than constituent hierarchy relationships.
This may be especially true if a verb required different complement
structures in the two languages.

The evidence we have presented for the syntactic integrity of
Spanish and English grammars, even when they are being used sequentially
and simultaneously, bolsters other arguments for nonconvergence
of Spanish and English in the Puerto Rican speech community. A
quantitative semantic analysis of tense and aspect (Pousada and Poplack
1979) and morphophonological analysis of word-final inflections
(Poplack 1980) in the same community have also shown that the
grammar of Spanish (aside from the lexicon), which serves a wide
range of communicative functions, has been extraordinarily resistant
to influence from the grammar of English; this despite the economic
and political dominance of the English-speaking community.

This integrity of the monolingual modes of discourse in the 
community clearly puts into relief the special nature of the code- 
switching mode as a distinct communicative resource for skilled bi- 
lingual speakers. This mode, which is not to be confused with borrow- 
ing or other language contact phenomena, is governed by a well-defined 
set of syntactic rules. We have shown its structure to be accessible 
through the scientific study of speech performance in much the same 
way as monolingual varieties. 

FOOTNOTES

The situation with trilingual code-switching gives rise to an analogous
[...], somewhat more complicated.

This includes the transformational grammar approach of Barkin and
Rivas (1979), despite their concern for keeping the monolingual gram- 
mars separate while generating the set of code-switched sentences. 

4 The difficulty in constructing an example with less shaky postulates 
is a consequence of the shrinking stock of transformations now recog- 
nized by generative grammarians. 



5 This may seem like an uneconomical procedure. Why put sp:v superscripts
on rule output categories which never dominate verbs? Would it
not be preferable to limit the number of categories in the code-switching
grammar distinguished only by irrelevant superscripts? The answer
is yes, for any particular code-switching grammar. It is a simple matter to
determine which categories can dominate a V, which an N, and so on, in
English and Spanish. But for an arbitrary code-switching grammar,
this means devising an algorithm to determine exactly which
non-terminal categories may dominate which terminal categories in a potential
equivalence constraint violation. This should not be difficult and
may well be preferable, but to keep the present already complicated
exposition as short as possible, we omit the discussion of such an
algorithm, at the expense of a proliferation of superscripts.

6 For example, even in the present case of Spanish-English
code-switching, it seems probable that it will be necessary to include certain
'hybrid' rules. Here the first half of the rule output will reflect a 
strictly Spanish pattern, say, while the second half will be purely 
English, but there is no constraint against switching between the two 
halves. This is a complication in detail only, and we will not discuss 
it further here. 

7 Certain boundary types appear collapsed in the Table, e.g. the four
combinations between N or NP and V or VP, because of differences in
the boundary frequency calculations and the coding of the original
data: although only NP VP boundaries are generated by our
code-switching grammar, some switches had previously been coded N VP
or NP V.